126 research outputs found
5,13-Disulfamoyl-1,9-diazatetracyclo[7.7.1.02,7.010,15]heptadeca-2(7),3,5,10,12,14-hexaen-1-ium chloride
In the title salt, C15H17N4O4S2+·Cl−, the chloride anion is disordered over two positions with occupancies of 0.776 (6) and 0.224 (6). The cation adopts an L shape and the dihedral angle between the benzene rings is 82.5 (3)°. In the crystal, inversion dimers of cations linked by pairs of N—H⋯N hydrogen bonds occur, with the bond arising from the protonated N atom. The cationic dimers are linked into chains via the disordered chloride ions by way of N—H⋯Cl hydrogen bonds, and N—H⋯O, C—H⋯O and C—H⋯Cl interactions also occur, which help to consolidate the three-dimensional network.
Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots
Improving the generalization capabilities of general-purpose robotic agents
has long been a significant challenge actively pursued by research communities.
Existing approaches often rely on collecting large-scale real-world robotic
data, such as the RT-1 dataset. However, these approaches typically suffer from
low efficiency, limiting their capability in open-domain scenarios with new
objects and diverse backgrounds. In this paper, we propose a novel paradigm
that effectively leverages language-grounded segmentation masks generated by
state-of-the-art foundation models, to address a wide range of pick-and-place
robot manipulation tasks in everyday scenarios. By integrating precise
semantics and geometries conveyed from masks into our multi-view policy model,
our approach can perceive accurate object poses and enable sample-efficient
learning. Moreover, this design facilitates effective generalization to
grasping new objects whose shapes resemble those seen during training. Our approach
consists of two distinct steps. First, we introduce a series of foundation
models to accurately ground natural language demands across multiple tasks.
Second, we develop a Multi-modal Multi-view Policy Model that incorporates
inputs such as RGB images, semantic masks, and robot proprioception states to
jointly predict precise and executable robot actions. Extensive real-world
experiments conducted on a Franka Emika robot arm validate the effectiveness of
our proposed paradigm. Real-world demos are shown in YouTube
(https://www.youtube.com/watch?v=1m9wNzfp_4E ) and Bilibili
(https://www.bilibili.com/video/BV178411Z7H2/ )
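As a toy illustration of the fusion described above (language-grounded masks combined with multi-view RGB and proprioception to predict an action), the following sketch uses only NumPy; every function name, dimension, and weight is an illustrative assumption, not the paper's actual architecture:

```python
import numpy as np

def encode_view(rgb, mask):
    """Toy per-view encoder: zero out background pixels using the
    language-grounded mask, then pool to a fixed-length feature.
    (Illustrative stand-in for a learned visual backbone.)"""
    masked = rgb * mask[..., None]                        # keep only the grounded object
    return masked.reshape(-1, rgb.shape[-1]).mean(axis=0) # (3,) pooled feature

def policy(views, proprio, W):
    """Fuse features from all camera views with robot proprioception and
    map them linearly to an action (e.g., a 7-DoF end-effector command)."""
    feats = [encode_view(rgb, m) for rgb, m in views]
    x = np.concatenate(feats + [proprio])  # joint multi-modal feature vector
    return W @ x                           # linear "action head"

rng = np.random.default_rng(0)
views = [(rng.random((64, 64, 3)), (rng.random((64, 64)) > 0.5).astype(float))
         for _ in range(2)]                # two camera views with semantic masks
proprio = rng.random(7)                    # joint angles / gripper state
W = rng.random((7, 2 * 3 + 7))             # hypothetical action-head weights
action = policy(views, proprio, W)
print(action.shape)                        # (7,)
```

The point of the sketch is the data flow: masks filter each view before pooling, so the pooled features carry object-specific geometry into the action head.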
AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation
We propose a novel framework for learning high-level cognitive capabilities
in robot manipulation tasks, such as making a smiley face using building
blocks. These tasks often involve complex multi-step reasoning, presenting
significant challenges due to the limited paired data connecting human
instructions (e.g., making a smiley face) and robot actions (e.g., end-effector
movement). Existing approaches alleviate this challenge by adopting an
open-loop paradigm: they decompose high-level instructions into simple
sub-task plans and execute them step by step using low-level control models.
However, these approaches lack instant observation feedback during multi-step
reasoning, leading to sub-optimal results. To address this issue, we propose
to automatically collect a cognitive robot dataset using Large Language
Models (LLMs). The
resulting dataset AlphaBlock consists of 35 comprehensive high-level tasks of
multi-step text plans and paired observation sequences. To enable efficient
data acquisition, we employ elaborated multi-round prompt designs that
effectively reduce the burden of extensive human involvement. We further
propose a closed-loop multi-modal embodied planning model that autoregressively
generates plans by taking image observations as input. To facilitate effective
learning, we leverage MiniGPT-4 with a frozen visual encoder and LLM, and
fine-tune an additional vision adapter and Q-former to enable fine-grained spatial
perception for manipulation tasks. Experiments verify the superiority of our
method over existing open- and closed-loop baselines, with success-rate
improvements of 21.4% and 14.5% over ChatGPT- and GPT-4-based counterparts,
respectively. Real-world demos are shown in
https://www.youtube.com/watch?v=ayAzID1_qQk
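The closed-loop idea above — re-observing the scene after every executed sub-step so later plan steps can react to it — can be sketched as follows; the environment and planner here are hypothetical stand-ins, not the AlphaBlock implementation:

```python
def closed_loop_plan(instruction, env, planner, max_steps=10):
    """Closed-loop planning sketch: after each executed sub-step the
    planner receives a fresh observation, unlike an open-loop pipeline
    that commits to the full plan up front."""
    history, obs = [], env.observe()
    for _ in range(max_steps):
        step = planner(instruction, obs, history)  # autoregressive next sub-task
        if step == "done":
            break
        env.execute(step)
        history.append(step)
        obs = env.observe()                        # instant observation feedback
    return history

# Minimal toy environment and planner to exercise the loop.
class ToyEnv:
    def __init__(self):
        self.placed = 0
    def observe(self):
        return self.placed
    def execute(self, step):
        self.placed += 1

def toy_planner(instruction, obs, history):
    return "done" if obs >= 3 else f"place block {obs + 1}"

plan = closed_loop_plan("make a smiley face", ToyEnv(), toy_planner)
print(plan)  # ['place block 1', 'place block 2', 'place block 3']
```

Because each step is conditioned on the current observation, a failed or perturbed sub-step would change what the planner emits next, which is exactly what the open-loop decomposition cannot do.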
3,5-Dimethyl-1H-pyrazole–2-hydroxy-5-(phenyldiazenyl)benzoic acid (1/1)
There are two independent 3,5-dimethylpyrazole and two independent 2-hydroxy-5-(phenyldiazenyl)benzoic acid molecules [in which intramolecular O—H⋯O bonds form S(6) graph-set motifs] in the asymmetric unit of the title compound, C5H8N2·C13H10N2O3. In the crystal, the components are linked by intermolecular O—H⋯O, O—H⋯N and N—H⋯O hydrogen bonds, forming four-component clusters. Further stabilization is provided by weak C—H⋯π interactions.
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
We propose the first joint audio-video generation framework that delivers
engaging watching and listening experiences simultaneously, producing
high-quality, realistic videos. To generate joint audio-video pairs, we
propose a novel Multi-Modal Diffusion model (MM-Diffusion) with two coupled
denoising autoencoders. In contrast to existing single-modal diffusion models,
MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising
process by design. Two subnets for audio and video learn to gradually generate
aligned audio-video pairs from Gaussian noise. To ensure semantic consistency
across modalities, we propose a novel random-shift-based attention block
bridging the two subnets, which enables efficient cross-modal alignment and
thus mutually reinforces audio and video fidelity. Extensive
experiments show superior results in unconditional audio-video generation, and
zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve
the best FVD and FAD scores on the Landscape and AIST++ dancing datasets. Turing tests with
10k votes further demonstrate dominant preferences for our model. The code and
pre-trained models can be downloaded at
https://github.com/researchmm/MM-Diffusion. Comment: Accepted by CVPR 2023.
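A rough NumPy sketch of the random-shift cross-modal attention idea — one modality attends to a randomly shifted copy of the other, rather than to every temporal position — is given below; the shapes, shift scheme, and single-head linear form are simplifying assumptions, not the paper's exact block:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def random_shift_attention(video, audio, rng):
    """Cross-modal attention sketch: video frames attend to a randomly
    shifted version of the audio features, encouraging alignment that is
    robust to temporal offset (illustrative simplification)."""
    T, d = video.shape
    shift = rng.integers(0, T)                      # random temporal offset
    audio_shifted = np.roll(audio, shift, axis=0)
    scores = video @ audio_shifted.T / np.sqrt(d)   # (T, T) attention logits
    attn = softmax(scores, axis=-1)                 # rows sum to 1
    return attn @ audio_shifted                     # audio-conditioned video features

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16))  # 8 frames, 16-dim features per frame
audio = rng.standard_normal((8, 16))  # matching audio feature sequence
out = random_shift_attention(video, audio, rng)
print(out.shape)  # (8, 16)
```

In the full model this block would bridge the two denoising subnets in both directions; the sketch shows only the video-attends-to-audio half.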